Tagging Romanian Texts: a Case Study for QTAG, a Language Independent Probabilistic Tagger
Abstract
This paper describes an experiment on tagging Romanian using QTAG, a part-of-speech tagger that was originally developed for English but keeps a clear separation between the (probabilistic) processing engine and the (language-specific) resource data. This makes the tagger usable across languages, as shown by successful experiments on three quite different languages: English, Swedish and Romanian. After a brief presentation of the QTAG tagger, the paper discusses the language resources for Romanian and the evaluation of the results. A complexity metric for tagging experiments is proposed that relates the performance of a tagger to the “difficulty” of a text.
Similar resources
Probabilistic tagging of minority language data: a case study using Qtag
While probabilistic methods of part-of-speech tag assignment have long received consideration in corpus and computational-linguistic research, less attention would appear to have been paid to date to the development of tagging accuracy over rounds of iterative, interactive training in applications of these methods. Understanding this aspect of probabilistic tagging is arguably of particular imp...
Towards a Bayesian Stochastic Part-Of-Speech and Case Tagger of Natural Language Corpora
This paper introduces and evaluates a Bayesian Network probabilistic model for automatic Part-Of-Speech tagging of Modern Greek natural language texts. The Bayesian model for the task of POS tagging is mathematically formed and is compared to that of Hidden Markov, a broadly applied methodology. Our model is trained from annotated corpora, using lexical as well as contextual information. Unlike...
Adapting the TTL Romanian POS Tagger to the Biomedical Domain
This paper presents the adaptation of the Hidden Markov Models-based TTL part-of-speech tagger to the biomedical domain. TTL is a text processing platform that performs sentence splitting, tokenization, POS tagging, chunking and Named Entity Recognition (NER) for a number of languages, including Romanian. The POS tagging accuracy obtained by the TTL POS tagger exceeds 97% when TTL’s baseline mod...
Bayesian Reinforcement for a Probabilistic Neural Net Part-of-Speech Tagger
The present paper introduces a novel stochastic model for Part-Of-Speech tagging of natural language texts. While previous statistical approaches, such as Hidden Markov Models, are based on theoretical assumptions that are not always met in natural language, we propose a methodology which incorporates fundamental elements of two distinct machine learning disciplines. We make use of Bayesian know...
A Part-of-Speech Tagging System for the Persian Language
Part-of-speech (POS) tagging is an essential task for many models and methods in other areas of natural language processing, such as machine translation, spell checking, text-to-speech, and automatic speech recognition. So far, highly accurate POS taggers have been created for many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...